397 research outputs found
Combining phonological and acoustic ASR-free features for pathological speech intelligibility assessment
Intelligibility is widely used to measure the severity of articulatory problems in pathological speech. Recently, a number of automatic intelligibility assessment tools have been developed. Most of them use automatic speech recognizers (ASR) to compare the patient's utterance with the target text. These methods are bound to one language and tend to be less accurate when speakers hesitate or make reading errors. To circumvent these problems, two different ASR-free methods were developed over the last few years, each making use of only the acoustic or the phonological properties of the utterance. In this paper, we demonstrate that these ASR-free techniques are also able to predict intelligibility in other languages. Moreover, they prove to be complementary, yielding even better intelligibility predictions when both methods are combined.
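The abstract does not specify how the two predictors are combined; a common and simple choice is a linear fusion whose weights are fit by ordinary least squares on a development set. The sketch below illustrates that idea with hypothetical function names and toy data, not the authors' actual fusion method:

```python
def lstsq_fuse(acoustic, phonological, targets):
    """Fit intelligibility ~ w0 + w1*acoustic + w2*phonological by ordinary
    least squares (normal equations solved with Gaussian elimination)."""
    rows = [[1.0, a, p] for a, p in zip(acoustic, phonological)]
    # Build the normal equations A^T A w = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        aty[col], aty[piv] = aty[piv], aty[col]
        for r in range(col + 1, 3):
            f = ata[r][col] / ata[col][col]
            for c in range(col, 3):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    # Back substitution
    w = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        w[r] = (aty[r] - sum(ata[r][c] * w[c] for c in range(r + 1, 3))) / ata[r][r]
    return w

def fuse_predict(w, acoustic_score, phonological_score):
    """Combined intelligibility prediction from the two ASR-free scores."""
    return w[0] + w[1] * acoustic_score + w[2] * phonological_score
```

Because the two score streams capture different (complementary) evidence, the fitted weights let each contribute where it is most reliable.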
Topic spotting using subword units
In this paper we present a new approach for topic spotting based on subword units and feature vectors instead of words. In our first approach, we use only vector-quantized feature vectors and polygram language models for topic representation. In the second approach, we use phonemes instead of the vector-quantized feature vectors and again model the topics using polygram language models. We trained and tested the two methods on two different corpora. The first is part of a media corpus which contains data from TV shows for three different topics. The second is the VERBMOBIL corpus, where we used 18 dialog acts as topics. Each corpus was split into disjoint test and training sets. We achieved recognition rates of up to 82% for the three topics of the media corpus and up to 64% using the 18 dialog acts of the VERBMOBIL corpus as topics.
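The core mechanism described here — one subword-level language model per topic, with classification by likelihood — can be sketched with a toy bigram (a "polygram" with N=2) model. All names below are hypothetical, and the real system scores higher-order polygrams over VQ codebook indices or phoneme sequences:

```python
from collections import Counter
import math

class PolygramTopicModel:
    """Bigram language model over subword units for one topic."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha          # add-alpha smoothing constant
        self.bigrams = Counter()
        self.unigrams = Counter()
        self.vocab = set()

    def train(self, sequences):
        """sequences: iterable of subword-unit lists for this topic."""
        for seq in sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            self.vocab.update(padded)
            for a, b in zip(padded, padded[1:]):
                self.bigrams[(a, b)] += 1
                self.unigrams[a] += 1

    def log_prob(self, seq):
        """Smoothed log-likelihood of a subword sequence under this topic."""
        padded = ["<s>"] + list(seq) + ["</s>"]
        vocab_size = max(len(self.vocab), 1)
        lp = 0.0
        for a, b in zip(padded, padded[1:]):
            num = self.bigrams[(a, b)] + self.alpha
            den = self.unigrams[a] + self.alpha * vocab_size
            lp += math.log(num / den)
        return lp

def spot_topic(models, subword_seq):
    """Pick the topic whose language model assigns the highest likelihood."""
    return max(models, key=lambda t: models[t].log_prob(subword_seq))
```

The same scoring loop works whether the units are phoneme labels from a recognizer or codebook indices from vector quantization; only the training sequences change.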
Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0
Stuttering is a varied speech disorder that harms an individual's communication ability. Persons who stutter (PWS) often use speech therapy to cope with their condition. Improving speech recognition systems for people with such non-typical speech, or tracking the effectiveness of speech therapy, would require systems that can detect dysfluencies while at the same time being able to detect speech techniques acquired in therapy. This paper shows that fine-tuning wav2vec 2.0 [1] for the classification of stuttering on a sizeable English corpus containing stuttered speech, in conjunction with multi-task learning, boosts the effectiveness of the general-purpose wav2vec 2.0 features for detecting stuttering in speech, both within and across languages. We evaluate our method on FluencyBank [2] and the German therapy-centric Kassel State of Fluency (KSoF) [3] dataset by training Support Vector Machine classifiers using features extracted from the fine-tuned models for six different stuttering-related event types: blocks, prolongations, sound repetitions, word repetitions, interjections, and, specific to therapy, speech modifications. Using embeddings from the fine-tuned models leads to relative classification performance gains of up to 27% in F1-score. Comment: Accepted at Interspeech 2022
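The downstream stage of this pipeline — one binary SVM per dysfluency event type, trained on fixed embeddings — can be illustrated with a from-scratch linear SVM (hinge loss, SGD). This is a stand-in sketch: the paper's embeddings come from a fine-tuned wav2vec 2.0 model and the classifiers would in practice be a standard SVM implementation, while the vectors and names here are toy assumptions:

```python
import random

def train_linear_svm(X, y, epochs=200, lam=0.01, lr=0.1):
    """Train a binary linear SVM by SGD on the hinge loss.
    X: list of feature vectors (e.g. utterance embeddings); y: labels in {-1, +1}."""
    dim = len(X[0])
    w = [0.0] * dim
    b = 0.0
    data = list(zip(X, y))
    for _ in range(epochs):
        random.shuffle(data)
        for x, t in data:
            margin = t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # hinge-loss violation: step toward the margin
                w = [wi - lr * (lam * wi - t * xi) for wi, xi in zip(w, x)]
                b += lr * t
            else:           # correctly classified with margin: only regularize
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def svm_predict(w, b, x):
    """+1 if the embedding is classified as containing the event, else -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# One independent detector per event type, as in the paper's setup:
EVENT_TYPES = ["block", "prolongation", "sound_repetition",
               "word_repetition", "interjection", "speech_modification"]
```

Training one detector per event type (rather than a single multi-class model) matches the evaluation protocol described in the abstract, where each stuttering-related event is scored separately.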
Syntactic-prosodic labeling of large spontaneous speech data-bases
In automatic speech understanding, the division of continuously running speech into syntactic chunks is a major problem. Syntactic boundaries are often marked by prosodic means. Training statistical models for prosodic boundaries requires large databases. For the German Verbmobil project (automatic speech-to-speech translation), we developed a syntactic-prosodic labeling scheme in which two main types of boundaries (major syntactic boundaries and syntactically ambiguous boundaries), as well as some other special boundaries, are labeled for a large Verbmobil spontaneous speech corpus. We compare the results of classifiers (multilayer perceptrons and language models) trained on these syntactic-prosodic boundary labels with classifiers trained on perceptual-prosodic and purely syntactic labels. The main advantage of the rough syntactic-prosodic labels presented in this paper is that large amounts of data could be labeled within a short time. As a result, the classifiers trained with these labels turned out to be superior (recognition rates of up to 96%).
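One of the classifier families mentioned here, a language model over word sequences with boundary tokens, can be sketched as follows: insert a boundary symbol into the training text wherever a labeled boundary occurs, then decide at test time whether a boundary follows a word by comparing the likelihood of the local context with and without the boundary token. This is a minimal illustration with hypothetical names and a bigram model, not the Verbmobil system itself:

```python
from collections import Counter
import math

class BoundaryLM:
    """Bigram LM over words interleaved with a boundary token '<B>'."""
    def __init__(self, alpha=0.5):
        self.alpha = alpha          # add-alpha smoothing constant
        self.bigrams = Counter()
        self.unigrams = Counter()
        self.vocab = set()

    def train(self, labeled_sequences):
        """Each sequence is a token list with '<B>' at labeled boundaries,
        e.g. ["i", "agree", "<B>", "see", "you"]."""
        for toks in labeled_sequences:
            self.vocab.update(toks)
            for a, b in zip(toks, toks[1:]):
                self.bigrams[(a, b)] += 1
                self.unigrams[a] += 1

    def _lp(self, a, b):
        vocab_size = max(len(self.vocab), 1)
        return math.log((self.bigrams[(a, b)] + self.alpha) /
                        (self.unigrams[a] + self.alpha * vocab_size))

    def boundary_after(self, prev_word, next_word):
        """True if inserting '<B>' between the two words is more likely
        than the direct word-to-word transition."""
        with_boundary = self._lp(prev_word, "<B>") + self._lp("<B>", next_word)
        without_boundary = self._lp(prev_word, next_word)
        return with_boundary > without_boundary
```

Because the labels only need to be inserted into running text, this kind of model can exploit exactly the large, quickly labeled corpora the abstract argues for.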